Iscp integration, hnsw update improvement #22594
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #21835
What this PR does / why we need it:
Load the models once, process all batches from DataRetriever, and finally save the model files into the database.
This change saves a lot of time that was previously spent uploading and downloading the model for every 8192-vector block.
PR Type
Enhancement, Tests
Description
• ISCP Integration: Implemented comprehensive Index Sync Consumer Producer (ISCP) system for asynchronous index management with CDC (Change Data Capture) support
• HNSW Update Improvements: Refactored HNSW synchronization from function-based to struct-based architecture with HnswSync for better modularity and async operations
• SQL Process Abstraction: Introduced SqlProcess abstraction layer replacing process.Process across vector index operations for background SQL execution
• Async Index Support: Added full async support for HNSW, IVF, and fulltext indexes with CDC task creation and management
• DDL Integration: Enhanced DDL operations (CREATE, ALTER, DROP) with ISCP job registration and cleanup for vector and fulltext indexes
• Performance Optimizations: Replaced JSON library with sonic for better performance and added thread management to HNSW models
• Comprehensive Testing: Added extensive test coverage for async index operations including HNSW f64, IVF, and fulltext indexes
• Transaction Management: Refactored consumers to use direct transaction management with proper timeout configurations
Diagram Walkthrough
File Walkthrough
2 files
sync.go
Refactor HNSW CDC sync to struct-based async architecture
pkg/vectorindex/hnsw/sync.go
• Refactored CdcSync function into a struct-based approach with HnswSync struct
• Separated CDC operations into Update, Save, and RunOnce methods for better modularity
• Added DownloadAll method for pre-downloading index models
• Changed function signatures to use sqlexec.SqlProcess instead of process.Process
mock_consumer.go
Refactor ISCP consumer for direct transaction management
pkg/iscp/mock_consumer.go
• Updated internal SQL consumer to use direct transaction management
• Added engine and transaction client dependencies for background operations
• Modified consume methods to work with client.TxnOperator instead of executor.TxnExecutor
• Enhanced error handling and transaction lifecycle management
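The load-once, update-many, save-once flow behind the HnswSync refactor can be sketched roughly as follows. This is a minimal illustration, not the code in sync.go: modelStore, hnswSync, and syncBatches are hypothetical stand-ins, the lowercase methods only mirror the DownloadAll, Update, and Save names listed above, and the real signatures are not shown in this description.

```go
package main

import "fmt"

// modelStore is a hypothetical stand-in for the database that holds the
// serialized index model files. It only counts transfers.
type modelStore struct {
	downloads, uploads int
}

// hnswSync is a simplified stand-in for the HnswSync struct.
type hnswSync struct {
	store  *modelStore
	loaded bool
	count  int // number of vectors applied to the in-memory model
}

// downloadAll loads the model files once, before any batch is applied.
func (s *hnswSync) downloadAll() {
	if !s.loaded {
		s.store.downloads++
		s.loaded = true
	}
}

// update applies one CDC batch to the in-memory model only.
func (s *hnswSync) update(batch []int) {
	s.downloadAll()
	s.count += len(batch)
}

// save writes the model files back once, after the last batch.
func (s *hnswSync) save() {
	s.store.uploads++
}

// syncBatches runs the whole cycle: load once, apply every batch, save once.
func syncBatches(store *modelStore, batches [][]int) int {
	s := &hnswSync{store: store}
	for _, b := range batches {
		s.update(b)
	}
	s.save()
	return s.count
}

func main() {
	store := &modelStore{}
	n := syncBatches(store, [][]int{{1, 2}, {3}, {4, 5, 6}})
	fmt.Println(n, store.downloads, store.uploads)
}
```

With three batches the sketch still performs a single download and a single upload, which is the saving the PR description refers to.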
23 files
sync_test.go
Update HNSW sync tests for new struct-based API
pkg/vectorindex/hnsw/sync_test.go
• Updated all test functions to use new HnswSync struct API
• Changed mock functions to accept sqlexec.SqlProcess instead of process.Process
• Added new test for continuous update operations with small capacity
• Modified test setup to create SqlProcess instances
index_consumer_test.go
Enhance ISCP consumer tests with IVF index support
pkg/iscp/index_consumer_test.go
• Added support for IVF index testing with new table definitions and consumer info
• Replaced mock SQL executor with stub-based approach using ExecWithResult
• Added comprehensive tests for IVF snapshot and tail operations
• Updated HNSW consumer tests to use new SQL writer approach
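The stub-based approach works because ExecWithResult is exposed as a function variable (see the watermark_updater.go entry). A minimal sketch of that pattern is shown below. The signature of execWithResult, the updateWatermark caller, and the withStub helper are all assumptions made for illustration; the real ExecWithResult takes different parameters.

```go
package main

import (
	"errors"
	"fmt"
)

// execWithResult is a package-level function variable so tests can replace it.
// Hypothetical signature: the real ExecWithResult differs.
var execWithResult = func(sql string) (int, error) {
	return 0, errors.New("no database available in this sketch")
}

// updateWatermark is a hypothetical caller that always goes through the variable.
func updateWatermark(table string, ts int64) error {
	sql := fmt.Sprintf("update %s set watermark = %d", table, ts)
	_, err := execWithResult(sql)
	return err
}

// withStub installs a stub, runs fn, and restores the original implementation.
func withStub(stub func(string) (int, error), fn func()) {
	orig := execWithResult
	execWithResult = stub
	defer func() { execWithResult = orig }()
	fn()
}

func main() {
	var seen string
	withStub(func(sql string) (int, error) { seen = sql; return 1, nil }, func() {
		fmt.Println(updateWatermark("jobs", 42))
	})
	fmt.Println(seen)
}
```

Restoring the original in a deferred call keeps one test's stub from leaking into the next.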
cache_test.go
Update vector index cache tests for SQL process abstraction
pkg/vectorindex/cache/cache_test.go
• Updated all cache test methods to use sqlexec.SqlProcess instead of process.Process
• Modified mock search implementations to accept new SQL process parameter
• Added SqlProcess creation in all test functions
model_test.go
HNSW model tests updated for SqlProcess interface
pkg/vectorindex/hnsw/model_test.go
• Updated test functions to use sqlexec.SqlProcess instead of process.Process
• Modified mock functions to accept SqlProcess parameter
• Updated all LoadMetadata, LoadIndex, and LoadIndexFromBuffer calls to use new interface
iscp_util_test.go
ISCP utility functions test suite implementation
pkg/sql/compile/iscp_util_test.go
• Added comprehensive test suite for ISCP utility functions
• Implemented mock functions for testing error scenarios in job registration/unregistration
• Added tests for index CDC validation, task creation, and deletion operations
search_test.go
IVF-flat search tests updated for SqlProcess interface
pkg/vectorindex/ivfflat/search_test.go
• Updated test mock functions to use sqlexec.SqlProcess instead of process.Process
• Modified test setup to create SqlProcess wrapper around process.Process
• Updated all search function calls to use new interface
search_test.go
HNSW search tests updated for SqlProcess interface
pkg/vectorindex/hnsw/search_test.go
• Updated mock functions to use sqlexec.SqlProcess parameter
• Modified test setup to create SqlProcess wrapper
• Updated cache search calls to use new interface
build_test.go
HNSW build tests updated for SqlProcess interface
pkg/vectorindex/hnsw/build_test.go
• Updated test functions to use sqlexec.SqlProcess wrapper
• Modified NewHnswBuild calls to use new interface
ivf_search_test.go
IVF search table function tests updated
pkg/sql/colexec/table_function/ivf_search_test.go
• Updated mock functions and test interfaces to use sqlexec.SqlProcess
• Modified mock search implementation to work with new interface
build_dml_util_test.go
DML utility async index tests added
pkg/sql/plan/build_dml_util_test.go
• Added test cases for async fulltext index functions
• Implemented tests for invalid JSON and async flag validation
fulltext_test.go
Fulltext table function tests updated
pkg/sql/colexec/table_function/fulltext_test.go
• Updated mock functions to use sqlexec.SqlProcess parameter
• Modified test setup to work with new interface
hnsw_search_test.go
HNSW search table function tests updated
pkg/sql/colexec/table_function/hnsw_search_test.go
• Updated mock search implementation to use sqlexec.SqlProcess
• Modified test interfaces for new SqlProcess parameter
hnsw_create_test.go
HNSW create table function tests updated
pkg/sql/colexec/table_function/hnsw_create_test.go
• Updated mock function to use sqlexec.SqlProcess parameter
• Modified test setup for new interface
sqlexec_test.go
SQL execution tests updated for SqlProcess
pkg/vectorindex/sqlexec/sqlexec_test.go
• Updated tests to use NewSqlProcess wrapper around process
• Modified RunTxn calls to use SqlProcess interface
types_test.go
Vector index types test updated for capacity parameter
pkg/vectorindex/types_test.go
• Updated test to pass capacity parameter to NewVectorIndexCdc
vector_hnsw_f64_async.sql
Async HNSW vector index integration tests
test/distributed/cases/vector/vector_hnsw_f64_async.sql
• Added comprehensive test cases for async HNSW index functionality
• Includes tests for insert, update, delete operations with CDC synchronization
• Tests vector similarity search after async index updates
fulltext_async.result
Async fulltext index test results
test/distributed/cases/fulltext/fulltext_async.result
• Added expected results for async fulltext index tests
• Shows successful async index creation and search functionality
fulltext_async.sql
Async fulltext index integration tests
test/distributed/cases/fulltext/fulltext_async.sql
• Added test cases for async fulltext index functionality
• Tests index creation, data insertion, search, and table operations
vector_ivf_async.result
New test results for asynchronous IVF vector indexing
test/distributed/cases/vector/vector_ivf_async.result
• Added new test result file for asynchronous IVF (Inverted File) vector index functionality
• Contains test results for creating IVF indexes with ASYNC keyword and vector similarity queries
• Includes tests for experimental_ivf_index setting, table creation, index creation, and L2 distance queries
• Tests both small datasets and larger datasets loaded from CSV files with sleep delays for async operations
vector_hnsw.result
Updated HNSW test results for async index operations
test/distributed/cases/vector/vector_hnsw.result
• Added sleep(30) call and corresponding result to allow time for asynchronous index building
• Updated query results to show proper vector similarity search results after async index completion
• Added drop table vector_index_01 statement and minor precision changes in cosine distance values
vector_ivf_async.sql
New test cases for asynchronous IVF vector indexing
test/distributed/cases/vector/vector_ivf_async.sql
• New test file for asynchronous IVF vector index functionality
• Tests experimental_ivf_index setting, table creation with vector columns, and async index creation
• Includes vector similarity queries using L2_DISTANCE function with sleep delays for async operations
• Tests both small manual datasets and larger CSV file imports with async index building
vector_hnsw_f64_async.result
New test results for asynchronous HNSW f64 vector indexing
test/distributed/cases/vector/vector_hnsw_f64_async.result
• Added new test result file for asynchronous HNSW vector indexing with 64-bit float vectors
• Contains test results for CRUD operations (insert, update, delete) with async HNSW indexes
• Includes vector similarity search results using L2_DISTANCE with sleep delays for async operations
• Tests both small datasets and larger CSV file imports with async index building
vector_hnsw.sql
Updated HNSW test cases for async index operations
test/distributed/cases/vector/vector_hnsw.sql
• Added sleep(30) call to allow time for asynchronous index building before queries
• Updated comment from "no result found" to "async index update so result found"
• Added drop table vector_index_01 statement for proper test cleanup
26 files
alter.go
Integrate ISCP job management in table alteration workflow
pkg/sql/compile/alter.go
• Added ISCP job management during table alteration operations
• Implemented index CDC task creation for unaffected indexes during copy operations
• Enhanced clone logic to handle index table information with unique and algorithm details
• Added proper cleanup of ISCP jobs for temporary tables
model.go
Enhance HNSW model with thread management and SQL process support
pkg/vectorindex/hnsw/model.go
• Added NThread field to HnswModel struct for thread management
• Refactored index initialization into separate initIndex method
• Enhanced error handling in index creation and loading operations
• Updated method signatures to use sqlexec.SqlProcess instead of process.Process
sqlexec.go
Add SQL execution abstraction for background operations
pkg/vectorindex/sqlexec/sqlexec.go
• Introduced SqlContext and SqlProcess abstractions for background SQL execution
• Added support for running SQL operations without frontend process context
• Implemented RunTxnWithSqlContext for background transaction management
• Enhanced SQL execution functions to work with both frontend and background contexts
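One way to picture the SqlProcess abstraction is as a small wrapper that carries either a frontend process or a background context and hands callers a single GetContext accessor. The sketch below is an assumption-laden illustration, not the code in sqlexec.go: frontendProc, sqlContext, newSqlProcessWithContext, withOrigin, and search are hypothetical, and the real types carry far more state.

```go
package main

import (
	"context"
	"fmt"
)

type originKey struct{}

// withOrigin tags a context so the two paths can be told apart in this sketch.
func withOrigin(name string) context.Context {
	return context.WithValue(context.Background(), originKey{}, name)
}

// frontendProc is a hypothetical stand-in for process.Process.
type frontendProc struct{ Ctx context.Context }

// sqlContext is a hypothetical stand-in for the background SqlContext.
type sqlContext struct{ ctx context.Context }

// sqlProcess carries either a frontend process or a background context.
type sqlProcess struct {
	proc   *frontendProc
	sqlCtx *sqlContext
}

func newSqlProcess(p *frontendProc) *sqlProcess          { return &sqlProcess{proc: p} }
func newSqlProcessWithContext(c *sqlContext) *sqlProcess { return &sqlProcess{sqlCtx: c} }

// GetContext returns the context of whichever side is present.
func (s *sqlProcess) GetContext() context.Context {
	switch {
	case s.proc != nil:
		return s.proc.Ctx
	case s.sqlCtx != nil:
		return s.sqlCtx.ctx
	default:
		return context.Background()
	}
}

// search is a hypothetical index operation that only needs a context,
// so it behaves the same for frontend queries and background jobs.
func search(sp *sqlProcess) string {
	origin, _ := sp.GetContext().Value(originKey{}).(string)
	return "search via " + origin
}

func main() {
	fg := newSqlProcess(&frontendProc{Ctx: withOrigin("frontend")})
	bg := newSqlProcessWithContext(&sqlContext{ctx: withOrigin("background")})
	fmt.Println(search(fg))
	fmt.Println(search(bg))
}
```

Callers that previously read proc.Ctx directly would go through GetContext instead, which is what lets the same index code run from an ISCP consumer without a frontend session.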
ddl.go
Integrate ISCP job management across DDL operations
pkg/sql/compile/ddl.go
• Added ISCP job registration and management for various DDL operations
• Implemented index CDC task creation for vector and fulltext indexes
• Added cleanup of ISCP jobs during database and table drops
• Enhanced IVF index handling with async support and ISCP integration
index_consumer.go
ISCP index consumer refactoring with async HNSW support
pkg/iscp/index_consumer.go
• Refactored IndexConsumer to use engine.Engine and client.TxnClient instead of executor.SQLExecutor
• Added separate execution paths for HNSW and other index types with runHnsw and runIndex functions
• Implemented transaction management using sqlexec.RunTxnWithSqlContext with different timeout configurations
• Added support for JSON-based CDC data processing for HNSW indexes using sonic.Unmarshal
search.go
IVF-flat search interface updated to SqlProcess
pkg/vectorindex/ivfflat/search.go
• Replaced process.Process with sqlexec.SqlProcess throughout the search implementation
• Updated context handling to use sqlproc.GetContext() instead of proc.Ctx
• Modified function signatures for LoadIndex, Search, findCentroids, and searchEntries methods
build_dml_util.go
DML utility functions enhanced with async index support
pkg/sql/plan/build_dml_util.go
• Added async index checks using catalog.IsIndexAsync() to skip synchronous processing
• Updated comment formatting for better readability in delete plans documentation
• Added early returns for async indexes in multiple index building functions
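The early-return pattern for async indexes might look roughly like the sketch below. It assumes the async flag lives in a JSON object of string-valued index parameters under an "async" key, which this description does not confirm; isIndexAsync is a hypothetical stand-in for catalog.IsIndexAsync, and buildIndexInsert is an invented example of an index-building function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isIndexAsync is a hypothetical stand-in for catalog.IsIndexAsync.
// Assumption: params is a JSON object of strings with "async" set to "true".
func isIndexAsync(params string) (bool, error) {
	if params == "" {
		return false, nil
	}
	var m map[string]string
	if err := json.Unmarshal([]byte(params), &m); err != nil {
		return false, err
	}
	return m["async"] == "true", nil
}

// buildIndexInsert returns the synchronous index-maintenance steps for a DML
// statement, or no steps at all when the index is async.
func buildIndexInsert(indexName, params string) ([]string, error) {
	async, err := isIndexAsync(params)
	if err != nil {
		return nil, err
	}
	if async {
		// Early return: a CDC task updates the index later.
		return nil, nil
	}
	return []string{"insert into index table for " + indexName}, nil
}

func main() {
	steps, err := buildIndexInsert("idx_async", `{"async":"true"}`)
	fmt.Println(len(steps), err)
	steps, err = buildIndexInsert("idx_sync", `{"m":"48"}`)
	fmt.Println(len(steps), err)
}
```

Invalid JSON surfaces as an error rather than silently falling back to synchronous handling, matching the invalid-JSON test cases mentioned for build_dml_util_test.go.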
search.go
HNSW search interface updated to SqlProcess
pkg/vectorindex/hnsw/search.go
• Replaced process.Process with sqlexec.SqlProcess in search interface
• Updated LoadMetadata, LoadIndex, and Search method signatures
• Modified context handling to use sqlproc.GetContext()
cache.go
Vector index cache updated for SqlProcess interface
pkg/vectorindex/cache/cache.go
• Updated VectorIndexSearchIf interface to use sqlexec.SqlProcess instead of process.Process
• Modified Load and Search method signatures throughout the cache implementation
• Updated all cache operations to work with new SqlProcess interface
ddl_index_algo.go
DDL index algorithms enhanced with async support
pkg/sql/compile/ddl_index_algo.go
• Added async index support for fulltext and HNSW indexes
• Implemented CDC task creation for async indexes instead of direct SQL execution
• Added conditional logic to handle both sync and async index creation workflows
index_sqlwriter.go
HNSW SQL writer refactored for JSON-based CDC
pkg/iscp/index_sqlwriter.go
• Modified HNSW SQL writer to return JSON data instead of SQL statements
• Added NewSync method to create HnswSync instances
• Updated CDC writer capacity configuration
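A capacity-bounded CDC buffer that emits JSON rather than SQL text could be sketched as below. Everything here is illustrative: cdcEntry, cdcBatch, the JSON field names, and the method names are hypothetical, and the standard encoding/json package is used where the PR uses sonic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdcEntry is one change record. Field names are illustrative only.
type cdcEntry struct {
	Type string    `json:"t"` // "U" for upsert, "D" for delete
	PKey int64     `json:"pk"`
	Vec  []float32 `json:"v,omitempty"`
}

// cdcBatch is a hypothetical capacity-bounded buffer of change records.
type cdcBatch struct {
	Data     []cdcEntry `json:"cdc"`
	capacity int
}

func newCdcBatch(capacity int) *cdcBatch {
	return &cdcBatch{Data: make([]cdcEntry, 0, capacity), capacity: capacity}
}

// Full reports whether the batch has reached its capacity and should be flushed.
func (b *cdcBatch) Full() bool { return len(b.Data) >= b.capacity }

func (b *cdcBatch) Upsert(pk int64, v []float32) {
	b.Data = append(b.Data, cdcEntry{Type: "U", PKey: pk, Vec: v})
}

func (b *cdcBatch) Delete(pk int64) {
	b.Data = append(b.Data, cdcEntry{Type: "D", PKey: pk})
}

// Flush serializes the batch to JSON and resets it for reuse.
func (b *cdcBatch) Flush() (string, error) {
	out, err := json.Marshal(b)
	if err != nil {
		return "", err
	}
	b.Data = b.Data[:0]
	return string(out), nil
}

func main() {
	b := newCdcBatch(2)
	b.Upsert(1, []float32{0.5, 1})
	b.Delete(2)
	if b.Full() {
		out, _ := b.Flush()
		fmt.Println(out)
	}
}
```

Handing the consumer structured JSON lets it apply a whole batch to the in-memory model directly instead of parsing and executing a generated SQL statement per block.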
ivf_search.go
IVF search table function updated for SqlProcess
pkg/sql/colexec/table_function/ivf_search.go
• Updated IVF search table function to use sqlexec.NewSqlProcess(proc)
• Modified getVersion and cache search calls to use SqlProcess interface
build.go
HNSW build interface updated to SqlProcess
pkg/vectorindex/hnsw/build.go
• Updated NewHnswBuild and related methods to use sqlexec.SqlProcess
• Modified context handling to use sqlproc.GetContext()
hnsw_search.go
HNSW search table function updated for SqlProcess
pkg/sql/colexec/table_function/hnsw_search.go
• Updated HNSW search table function to use sqlexec.NewSqlProcess(proc)
• Modified cache search call to use SqlProcess interface
data_retriever.go
Data retriever watermark update refactored
pkg/iscp/data_retriever.go
• Updated UpdateWatermark method to use direct SQL execution with transaction
• Replaced executor interface with context, cnUUID, and txn parameters
• Added proper timeout and system account context handling
types.go
Vector index types updated with sonic JSON library
pkg/vectorindex/types.go
• Replaced json package with sonic for better performance
• Updated NewVectorIndexCdc to accept capacity parameter
• Modified JSON marshaling to use sonic library
hnsw_create.go
HNSW create table function updated for SqlProcess
pkg/sql/colexec/table_function/hnsw_create.go
• Updated HNSW creation functions to use sqlexec.NewSqlProcess(proc)
• Modified NewHnswBuild calls to use SqlProcess interface
consumer.go
Consumer factory updated with engine and transaction client
pkg/iscp/consumer.go
• Updated NewConsumer function to accept engine.Engine and client.TxnClient parameters
• Modified consumer creation calls to pass additional parameters
sql.go
IVF-flat SQL utilities updated for SqlProcess
pkg/vectorindex/ivfflat/sql.go
• Updated GetVersion function to use sqlexec.SqlProcess instead of process.Process
• Modified context handling and error messages
types.go
ISCP types interface updated for direct transaction handling
pkg/iscp/types.go
• Updated DataRetriever interface to change UpdateWatermark method signature
• Removed executor dependency in favor of direct context and transaction parameters
metadata_scan.go
Metadata scan updated for SqlProcess interface
pkg/sql/colexec/table_function/metadata_scan.go
• Updated metadata scan to use sqlexec.NewSqlProcess(proc) for SQL execution
• Modified RunSql call to use SqlProcess interface
fulltext.go
Fulltext table function updated for SqlProcess
pkg/sql/colexec/table_function/fulltext.go
• Updated fulltext functions to use sqlexec.NewSqlProcess(proc)
• Modified SQL execution calls to use SqlProcess interface
ivf_create.go
IVF create table function updated for SqlProcess
pkg/sql/colexec/table_function/ivf_create.go
• Updated IVF creation functions to use sqlexec.NewSqlProcess(proc)
• Modified version retrieval and SQL execution calls
iteration.go
ISCP iteration updated for new consumer interface
pkg/iscp/iteration.go
• Updated NewConsumer call to include cnEngine and cnTxnClient parameters
watermark_updater.go
Watermark updater function made mockable
pkg/iscp/watermark_updater.go
• Made ExecWithResult function a variable for easier testing/mocking
util.go
Compile utility function error handling improved
pkg/sql/compile/util.go
• Updated genInsertIndexTableSqlForFullTextIndex to return error alongside SQL strings
• Added error handling to function signature
2 files
function_id_test.go
Remove HNSW CDC update function ID
pkg/sql/plan/function/function_id_test.go
• Removed HNSW_CDC_UPDATE function ID from predefined function IDs
• Updated FUNCTION_END_NUMBER to reflect the removal
function_id.go
HNSW CDC update function ID removed
pkg/sql/plan/function/function_id.go
• Removed HNSW_CDC_UPDATE function ID constant
• Decremented FUNCTION_END_NUMBER accordingly
1 files
iscp_util.go
ISCP utility functions for index CDC management
pkg/sql/compile/iscp_util.go
• Implemented ISCP (Index Sync Consumer Producer) utility functions for CDC task management
• Added functions for creating, deleting, and validating index CDC tasks
• Implemented job registration/unregistration with proper error handling
• Added support for different sinker types based on index algorithms
5 files